METHOD FOR MAPPING FACIAL ANIMATION VALUES TO HEAD MESH POSITIONS

BACKGROUND OF THE INVENTION 

The present invention relates to head animation, and more particularly, to generating an
animated three-dimensional video head based on two-dimensional video images.

Virtual spaces filled with avatars are an attractive way to allow for the experience of a 
shared environment. However, animation of a photo-realistic avatar generally requires intensive 
graphic processes, particularly for rendering facial features. 

Accordingly, there exists a significant need for improved rendering of facial features.
The present invention satisfies this need.

SUMMARY OF THE INVENTION

The present invention provides a technique for translating an animation vector to a target
mix vector. In the method, a calibration vector is generated and the animation vector is mapped
to the target mix vector using the calibration vector.

Other features and advantages of the present invention should be apparent from the
following description of the preferred embodiments taken in conjunction with the accompanying
drawings, which illustrate, by way of example, the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flow diagram showing a technique for translating an animation
vector to a target mix vector, in accordance with the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
The present invention provides a technique for translating an animation vector to a target
mix vector.

With reference to FIG. 1 , the animation of an avatar is defined by a set of targets. A 
mapping algorithm provides the translation from animation vector 12 to target mix vector. 

The animation vector is the abstracted sensing result. It is the most compact
representation of the facial expression as determined by audio-visual sensing.
By definition, the animation vector is zero for the neutral expression.

The target mix vector describes the contribution of each individual target to the current 
expression. 

Different mapping algorithms may be used. Their common goal is to provide a 
reasonable interpolation between the points in animation space associated with the targets. 
Each mapping algorithm is exactly defined by a set of parameters. The parameters vary from 
algorithm to algorithm. 

Calibration may be performed by multiplying the target mix vector with a diagonal matrix. 
Since the matrix is diagonal, it is henceforth referred to as the calibration vector. 

An overview of the translation from animation vector to target mix vector follows.
Let $a$ be the animation vector of dimension $N_a$ (number of animation values), $g$ the target
mix vector of dimension $M$ (number of independent targets), and $p_1, \ldots, p_L$ the parameters
of the mapping algorithm $F()$. Then

(1) $g = F(a, p_1, \ldots, p_L)$



The calibrated target mix vector is obtained by multiplying with the diagonal matrix
defined by the calibration vector $c$:

(2) $g_c = \begin{pmatrix} c_1 & & \\ & \ddots & \\ & & c_M \end{pmatrix} \cdot g$

Further, let $t_1, \ldots, t_M$ be the parameterizations of the targets associated with the components
of the target mix vector and $t_0$ the parameterization of the neutral model. The
parameterization of the current expression can then be obtained by a simple matrix multiplication:

(3) $t = (t_1 - t_0 \;\; t_2 - t_0 \;\; \ldots \;\; t_M - t_0) \cdot g_c + t_0$

The matrix $(t_1 - t_0 \;\; t_2 - t_0 \;\; \ldots \;\; t_M - t_0)$ is referred to as the target matrix $T$.
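As a minimal sketch of eqs. (2) and (3), the following Python fragment scales a target mix vector by a calibration vector and blends toy target parameterizations. The function names `calibrate` and `blend` and all numeric values are illustrative assumptions and do not appear in the described implementation.

```python
def calibrate(g, c):
    """Eq. (2): multiplying by diag(c) is an element-wise scaling of g."""
    return [ci * gi for ci, gi in zip(c, g)]

def blend(targets, neutral, g_c):
    """Eq. (3): t = (t_1 - t_0 ... t_M - t_0) . g_c + t_0."""
    t = list(neutral)
    for weight, target in zip(g_c, targets):
        for k, (tk, t0k) in enumerate(zip(target, neutral)):
            t[k] += weight * (tk - t0k)
    return t

# Toy data (hypothetical): 3 model parameters, M = 2 targets.
neutral = [0.0, 0.0, 0.0]        # t_0
targets = [[1.0, 0.0, 0.0],      # t_1
           [0.0, 2.0, 0.0]]      # t_2
g = [0.5, 1.0]                   # uncalibrated target mix vector
c = [2.0, 0.5]                   # calibration vector

g_c = calibrate(g, c)
t = blend(targets, neutral, g_c)
```

Because the calibration matrix is diagonal, no full matrix product is needed for eq. (2); an element-wise multiplication suffices.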

A description of the mapping algorithm follows. Every target $t_i$ is associated with an
animation vector $a_i$. The target and animation vectors are connected by this deliberate
association only. This becomes apparent in the formulas, where targets enter only as the $i$-th
unit vector $e_i$, representing the target mix vector that results in exactly that target. The
parameterization of the target $t_i$ is not relevant for computation of the mapping parameters.
(This means that it does not matter whether the target is defined by vertex positions, morph link
positions or muscle tensions.)

The animation vector can be set manually, or it can be derived from a reference model
with targets if the model is equipped with ground truth anchors that enable the application of the
tracking algorithm to the model and its deformations, and if the reference model implements all
needed targets. The reference model must have a human geometry, since the purpose of the
ground truth anchors is to simulate tracking on the model. Manual editing is necessary if the
animation vector contains elements that cannot be derived from visual sensing, such as lip
sync animation values.

The mapping is basically a multidimensional interpolation in $a$ between the target points.
The mapping parameters are determined by minimizing the error in reproducing the target
points. Depending on the mapping algorithm, perfect reproduction of the target points may not
be possible.

The parameters $p_1, \ldots, p_L$ of the mapping are determined by solving the set of equations

(4) $\frac{\partial}{\partial p_j} \sum_{i=1}^{M} \left\| e_i - F(a_i, p_1, \ldots, p_L) \right\|^2 = 0, \quad \forall j \in [1, L]$



Targets can be divided into independent groups of targets, such as eye region targets 
and mouth region targets. Different mapping algorithms can be applied to the different groups to 
achieve more flexibility. 

A description of types of mapping algorithms follows. The simplest mapping algorithm is
the linear mapping:

(5) $F(a, P) = P \cdot a$

The parameter matrix $P$ is determined by solving the equation

(6) $P \cdot (a_1 \;\; a_2 \;\; \ldots \;\; a_M) = 1$

using singular value decomposition (SVD), where $1$ denotes the identity matrix. If $N_a < M$,
equation (6) is overdetermined and SVD will return the "solution" that satisfies eq. (4). If
$N_a > M$, the equation is underdetermined and the SVD solution will be the one with the smallest
norm that satisfies equation (6). SVD is described in great detail in "Numerical Recipes".

The linear method is internally referred to as the "matrix method" because of the form of 
the algorithm parameters. 
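The linear method can be sketched for the special case in which the number of animation values equals the number of targets, so that the matrix of animation vectors is square and invertible. The sketch below uses a plain Gauss-Jordan inversion in place of SVD to stay dependency-free; the helper names and the two toy animation vectors are assumptions made for illustration only.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (square, non-singular input)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

# Hypothetical animation vectors associated with targets 1 and 2.
a1 = [1.0, 0.0]
a2 = [1.0, 1.0]
A = [[a1[0], a2[0]],        # columns of A are a_1 and a_2
     [a1[1], a2[1]]]
P = invert(A)               # solves P . A = identity, i.e. eq. (6), for square A

g1 = matvec(P, a1)          # should reproduce the unit vector e_1
g2 = matvec(P, a2)          # should reproduce the unit vector e_2
```

In the over- or underdetermined cases described above, a pseudo-inverse computed via SVD would take the place of `invert`.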

A more general mapping is achieved by using a set of basis functions as input. 
Obviously, the linear method is a special case of this more general method. 



(7) $F(a, P) = P \cdot B(a)$

The solution is analogous to the solution of the linear problem. Since the number of basis
functions is independent of the number of animation values $N_a$, it is always possible to choose
exactly $M$ functions, so that the system is neither over- nor underdetermined:

(8) $P = (B(a_1) \;\; B(a_2) \;\; \ldots \;\; B(a_M))^{-1}$



The basis functions can be chosen manually by carefully analyzing the animation 
vectors of the participating targets. This is the currently deployed method. The internal name is 
"matrix2" because it is an extension of the linear method. 

The following set of basis functions and targets may be used. The basis functions are 
commonly referred to as "animation tag expressions". 



Eye/eyebrow group basis functions (formulated in reverse Polish notation, UPN):

eyeOpen, eyeAsym, -, |-
eyeOpen, eyeAsym, +, |-
eyeOpen, |+
eyeBrowRaise, eyeBrowAsym, -, |+
eyeBrowRaise, eyeBrowAsym, +, |+
eyeBrowRaise, |-

Eye/eyebrow group targets:

MTEyeCloseR
MTEyeCloseL
MTEyeOpenWide
MTBrowRaiseR
MTBrowRaiseL
MTBrowFurrow

Mouth group basis functions:

lipDistance, mouthVertPos, -
lipDistance, mouthVertPos, +, |+
lipDistance, mouthVertPos, +, |-
mouthWidth, |-
mouthWidth, |+
mouthCornerUp, |+
mouthCornerUp, |-
mouthHorizPos, 0.5, mouthRot,
mouthHorizPos, 0.5, mouthRot,
visemeB
visemeF
visemeS

Mouth group targets:

MTMouthAh
MTMouthDisgust
MTMouthDown
MTMouthOh
MTMouthEe
MTMouthSmileOpen
MTMouthFrown
MTMouthPullL
MTMouthPullR
MTMouthB
MTMouthF
MTMouthS

Each basis function is designed to best match a specific target. The order in the lists
above represents that match. It is very tedious to design a basis function manually such that it
responds only when the associated target is acted and does not respond when any other target is
acted. Off-diagonal elements of the $P$ matrix lead to corrections and decouple the targets such
that this desired behavior is achieved.
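To illustrate how an animation tag expression in reverse Polish notation might be evaluated, the following Python sketch processes the comma-separated tokens with a stack. The text does not define the operators `|+` and `|-`; the sketch assumes, purely for illustration, that `|+` keeps the positive part of a value and `|-` returns the magnitude of its negative part. The function name and the sample animation values are likewise hypothetical.

```python
def eval_tag_expression(expr, values):
    """Evaluate a comma-separated postfix expression against named animation values."""
    stack = []
    for token in (tok.strip() for tok in expr.split(",")):
        if token in ("+", "-"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if token == "+" else left - right)
        elif token == "|+":            # ASSUMPTION: keep only the positive part
            stack.append(max(stack.pop(), 0.0))
        elif token == "|-":            # ASSUMPTION: magnitude of the negative part
            stack.append(max(-stack.pop(), 0.0))
        elif token in values:
            stack.append(values[token])
        else:
            stack.append(float(token))  # numeric literal such as 0.5
    if len(stack) != 1:
        raise ValueError("malformed expression: " + expr)
    return stack[0]

# Hypothetical animation values for a half-closed, slightly asymmetric eye pose.
values = {"eyeOpen": -0.6, "eyeAsym": 0.2, "eyeBrowRaise": 0.3, "eyeBrowAsym": 0.0}
close_a = eval_tag_expression("eyeOpen, eyeAsym, -, |-", values)
close_b = eval_tag_expression("eyeOpen, eyeAsym, +, |-", values)
open_wide = eval_tag_expression("eyeOpen, |+", values)
```

Under these assumed operator semantics, the two eye-closing expressions respond with different strengths to the asymmetric pose while the eye-opening expression stays at zero, which is the kind of target-specific response the basis functions are meant to provide.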

The target matrix, calibration matrix and mapping parameter matrix can be combined 
into one matrix by simple multiplication, which can be done ahead of time: 



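(9) $t = T \cdot g_c + t_0 = T \cdot \begin{pmatrix} c_1 & & \\ & \ddots & \\ & & c_M \end{pmatrix} \cdot P \cdot B(a) + t_0 = D \cdot B(a) + t_0$

with $D = T \cdot \begin{pmatrix} c_1 & & \\ & \ddots & \\ & & c_M \end{pmatrix} \cdot P$

The decoder matrix $D$, offset vector (neutral target) $t_0$ and definition of basis functions
$B(a)$ are the parameters of the animation decoder used in the various players.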
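The benefit of combining the matrices ahead of time can be sketched as follows: the decoder matrix is computed once, after which the player needs only one matrix-vector product and one vector addition per frame. All function names and numeric values below are illustrative assumptions, not part of the described code.

```python
def matmul(x, y):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def decoder_matrix(T, c, P):
    """D = T . diag(c) . P, computed once at authoring time."""
    n = len(c)
    diag_c = [[c[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return matmul(matmul(T, diag_c), P)

def decode(D, t0, b):
    """Player side: t = D . B(a) + t_0."""
    return [d + o for d, o in zip(matvec(D, b), t0)]

# Toy sizes (hypothetical): 3 model parameters, M = 2 targets, 2 basis functions.
T = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]   # columns are t_i - t_0
c = [2.0, 0.5]                              # calibration vector
P = [[1.0, -1.0], [0.0, 1.0]]               # mapping parameter matrix
t0 = [0.1, 0.2, 0.3]                        # neutral parameterization
b = [1.0, 0.5]                              # B(a) for some animation vector a

D = decoder_matrix(T, c, P)
t_fast = decode(D, t0, b)

# Reference: the same result computed step by step (map, calibrate, blend).
g = matvec(P, b)
g_c = [ci * gi for ci, gi in zip(c, g)]
t_ref = [v + o for v, o in zip(matvec(T, g_c), t0)]
```

Both paths yield the same parameterization; the precomputed form simply moves the work of the two inner products out of the per-frame loop.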

A description of radial/angular basis function mapping follows. The basis functions can 
also be a set of automatically determined functions such as radial basis functions. Certain 
properties of the mapping algorithm can be designed into the basis functions. A desirable 
property of the algorithm is scale linearity: 
(10) $F(\lambda a) = \lambda F(a)$



This means that the "amplitude" of an expression is translated linearly to the model. It 
can be obtained by using basis functions of the form 



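(11) $b_i(a) = \frac{\|a\|}{\|a_i\|} \, \tilde{b}_i\!\left( \left\| \frac{a}{\|a\|} - \frac{a_i}{\|a_i\|} \right\| \right)$

Because of their linear or locally linear behavior, the following functions are useful for
$\tilde{b}(\vartheta)$:

(12) $\tilde{b}(\vartheta) = \vartheta$

(13) $\tilde{b}(\vartheta) = \frac{1}{\alpha} \arctan(\alpha \vartheta)$

(14) $\tilde{b}(\vartheta) =$ [expression illegible in the source]

(13) and (14) can be locally approximated by (12) and have interesting saturation
characteristics. The parameter $\alpha$ determines the localization of these basis functions.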
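The scale linearity property of eq. (10) can be checked numerically with a small sketch. The basis function below assumes the form of a radial factor, the norm of $a$ relative to the norm of the target's animation vector, multiplied by a saturating function of the angular distance between the two directions. This form, the arctan profile, the function names and the value of the localization parameter are assumptions made for illustration and are not confirmed details of the deployed implementation.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(x * x for x in v))

def b_tilde(theta, alpha):
    """Saturating profile (1/alpha) * arctan(alpha * theta); about theta near zero."""
    return math.atan(alpha * theta) / alpha

def basis(a, a_i, alpha=4.0):
    """ASSUMED form: (|a| / |a_i|) * b_tilde(| a/|a| - a_i/|a_i| |)."""
    na, ni = norm(a), norm(a_i)
    if na == 0.0:
        return 0.0                     # the neutral expression maps to zero
    angular = norm([x / na - y / ni for x, y in zip(a, a_i)])
    return (na / ni) * b_tilde(angular, alpha)

a_i = [1.0, 0.0]    # hypothetical animation vector associated with one target
a = [0.3, 0.4]      # hypothetical current animation vector
lam = 2.5           # positive scale factor

lhs = basis([lam * x for x in a], a_i)   # b(lambda * a)
rhs = lam * basis(a, a_i)                # lambda * b(a)
```

Since the angular argument depends only on the direction of $a$, scaling $a$ by a positive factor scales the basis function by the same factor, so a mapping that is linear in such basis functions inherits the property.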

All mapping algorithms are somewhat arbitrary because of their nature as interpolation
algorithms. The way interpolation is done between targets is determined by the choice of basis
functions. It seems reasonable to aim for the algorithm that delivers the most linear
interpolation possible while still reproducing all targets.

If the basis functions are determined automatically, it is easily possible to add $K$
dependent targets that are created by linear superposition of independent targets. This
enables one to have more control over the interpolation process by providing additional support
points. Eq. (4) is then generalized to:

(15) $\frac{\partial}{\partial p_j} \sum_{i=1}^{M+K} \left\| g_i - F(a_i, p_1, \ldots, p_L) \right\|^2 = 0 \quad \forall j \in [1, L]; \quad g_i = e_i \;\; \forall i \le M$

Each dependent target is defined by its animation vector $a_i$ and target mix vector $g_i$
with $i > M$, which defines the superposition of independent targets. Eq. (8) has to be modified
to yield the solution to this more general problem:

(16) $P = (g_1 \;\; g_2 \;\; \ldots \;\; g_{M+K}) \cdot (B(a_1) \;\; B(a_2) \;\; \ldots \;\; B(a_{M+K}))^{-1}$

A description follows of code to implement the animation technique outlined before. The
implementation consists of authoring and player parts. The description covers the authoring part
only up to and including the container for player decoder parameters. The code can be
separated into classes and functions that are directly used by applications and classes that are
only used internally. Throughout the descriptions, indices to either targets or animation values
are replaced by string annotations (names). This allows for a more flexible configuration.
Example: Instead of referring to target 3 or animation value 5, names like "MTMouthAh" or
"lipDistance" are used in the interface. The actual indices may vary from object to object.
Internally the names are used to build index lookup tables to perform the various linear algebra
operations efficiently.

A description of the interface follows:

aut_KeyframeData

A pure container class to store target parameterizations ($t_i$) and ground truth data for
the purpose of computing animation vectors. Contains additional annotation information, such
as the meaning of certain model parameters and names of animation values.

This class serves as input for the computation of the mapping parameters and for the
computation of the decoder matrix.

aut_LinTargetMapper

A facade for the computation of a linear or manual basis function mapping. Contains
necessary configuration parameters to define basis functions and participating targets.

Input is an aut_KeyframeData object (ground truth data only), output is an
aut_MappingResult object.

aut_MappingResult

A container class to store the mapping parameters, comprised of the mapping parameter
matrix $P$, the definition of the basis functions $B(a)$ (animation tag expressions) and the
calibration vector $c$. Contains code to compute the calibrated target mix vector $g_c$. Additional
annotation information is contained to associate target names to target mix vector components.

aut_computeAnimTagTrafo

A function to perform the multiplication of the target matrix, calibration vector and
mapping parameter matrix to obtain the decoder matrix. Input to the function is an
aut_KeyframeData object (to obtain the target matrix; it does not need ground truth data) and
an aut_MappingResult object. The output is an att_AnimTagTrafo object as described
below.

att_AnimTagTrafo

A container class to hold the decoder parameters, comprised of decoder matrix, offset
vector and definition of basis functions (animation tag expressions). Functions provided are:
computation of the model parameterization given an arbitrary animation vector and transform of
the model parameterization into a different coordinate system.

A description of internal components follows:

aut_LabeledFltMatif and derived classes

aut_LabeledFltMatif is a pure interface to a "labeled float matrix". Each row and
column of the matrix is annotated with strings to identify the data stored in the matrix.
Read/write access can be done conveniently by providing names instead of indices. An actual
implementation of the container is aut_LabeledFltMat, which is used to store intermediate
data such as animation vectors.

aut_ExprExpLFMif is a proxy to an aut_LabeledFltMatif and provides the
functionality to compute mathematical expressions of matrix elements. This is used to compute
basis functions/animation tag expressions. Instead of supplying the name of a matrix
component, an expression is passed.
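The idea of a labeled float matrix with name-based access can be sketched in a few lines of Python. The class below is a hypothetical illustration of the concept, with name-to-index lookup tables built once at construction; it is not the interface of aut_LabeledFltMatif, and its method names are assumptions.

```python
class LabeledFloatMatrix:
    """Float matrix whose rows and columns are addressed by name."""

    def __init__(self, row_names, col_names):
        # Name -> index lookup tables are built once and reused for every access.
        self._rows = {name: i for i, name in enumerate(row_names)}
        self._cols = {name: j for j, name in enumerate(col_names)}
        self._data = [[0.0] * len(col_names) for _ in row_names]

    def get(self, row, col):
        return self._data[self._rows[row]][self._cols[col]]

    def set(self, row, col, value):
        self._data[self._rows[row]][self._cols[col]] = float(value)

    def column(self, col):
        """Return one column as a plain list, e.g. the animation vector of a target."""
        j = self._cols[col]
        return [r[j] for r in self._data]

# Rows are animation values, columns are targets (names taken from the text).
anim = LabeledFloatMatrix(["lipDistance", "mouthWidth"], ["MTMouthAh", "MTMouthEe"])
anim.set("lipDistance", "MTMouthAh", 0.8)
anim.set("mouthWidth", "MTMouthEe", 0.6)
ah_vector = anim.column("MTMouthAh")
```

Because callers refer only to names, two objects may order their rows and columns differently without affecting code that combines them.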

aut_MapBuilder and derived classes

Provide functionality to compute mapping algorithm parameters. The derived class
aut_LinearMapBuilder is used in aut_LinTargetMapper. The class is configured on the
fly from the parameters of aut_LinTargetMapper.

aut_Mapping and derived classes

Container classes for mapping algorithm parameters with a function to compute the mapping.
One aut_Mapping derived class is associated with each aut_MapBuilder derived
class.

aut_LinearMapping is similar to aut_MappingResult, which is the older
implementation. It is used internally in aut_LinTargetMapper. Data is then copied into an
aut_MappingResult.

aut_GroundTruthCompiler

Provides functionality to compute animation vectors from ground truth data stored in an
aut_KeyframeData object. The output is stored in an aut_LabeledFltMat object for use in
the aut_MapBuilder classes.

A description of the process follows. A human anatomy reference model with
targets is created in 3D Studio Max. Ground truth anchors (spheres named GT_xx) are added
to the model to mark the position of tracking nodes. The model is then exported as VRML. The
VRML file can be read using the plu_VRMLoader (library PlayerUtils). Target and ground truth
data is extracted using the SceneGraph functions and stored into an aut_KeyframeData.
Ground truth data exists only for the first target in the VRML file. The function
aut_KeyframeData::setGTDataViaAnchorPos() is used to trace the vertices marked by the
ground truth anchors and thus generate ground truth for all targets.

The aut_KeyframeData object is then passed to a properly configured
aut_LinTargetMapper to compute the aut_MappingResult, which can be written to a file. The
aut_MappingResult created in this way can be used with all models containing targets that
model the same expressions, be they human or cartoonish. If decoder data is to be generated for
another model, an aut_KeyframeData object has to be prepared from that model. It does not
need to contain the ground truth data since this is only used to compute the aut_MappingResult.
If the decoder data is generated for the reference model, the original aut_KeyframeData (see
above) can be used. It is important that all model parameters that change from neutral to any
target are stored in the aut_KeyframeData. This usually includes all vertex positions and the
position/angles of the bottom teeth object.

The aut_KeyframeData and the aut_MappingResult are then merged into an
att_AnimTagTrafo using the function aut_computeAnimTagTrafo.

The att_AnimTagTrafo contains the decoder configuration parameters that are then 
exported into the different content files (Pulse/Shout). 
The facial animation values or tags may be displacement values relative to neutral face
values. Advantageously, 8 to 22 (or more) facial animation values may be used to define and 
animate the mouth, eyes, eyebrows, nose, and the head angle. Representative facial animation 
values for the mouth may include vertical mouth position, horizontal mouth position, mouth 
width, lip distance, and mouth corner position (left and right). 

Morphing of a texture map on a deformed three-dimensional head mesh is described in 
U.S. patent number 6,272,231, titled WAVELET-BASED FACIAL MOTION CAPTURE FOR 
AVATAR ANIMATION. Imaging systems for acquiring images and image mapping are 
described in U.S. patent application serial number 09/724,320, titled METHOD AND 
APPARATUS FOR RELIEF TEXTURE MAP FLIPPING. The entire disclosures of U.S. patent 
number 6,272,231 and U.S. patent application serial number 09/724,320 are incorporated herein 
by reference. 

Although the foregoing discloses the preferred embodiments of the present invention, it is 
understood that those skilled in the art may make various changes to the preferred embodiments 
without departing from the scope of the invention. The invention is defined only by the following
claims.